RAG Workshop - Forward and Beyond

Estimated time: 30 minutes

by Kevin Lin, Andreas Mueller


Current State: Proof of Concept (POC)


Product vs. Custom Solution

| Use a product if... | Build a custom solution if... |
|---------------------|-------------------------------|
| ... data is already in the right place | ... you need to integrate data sources |
| ... it works well for your use case | ... out-of-the-box products don't work well enough |
| ... the licensing costs are acceptable | ... development and operation costs are cheaper |


The Path Forward

  • Turning the POC into a scalable product.
  • Exploring multi-agent architectures and advanced chunking for better retrieval and responses.
  • Leveraging the feedback loop to continuously improve the model.
  • Enhancing the feature set according to the use case.

Zühlke has its own Sales Accelerator

  • Built on the UNIQA source code, with a fully customizable UI.
  • A new instance can be spun up in hours.

💡 Topic: Evaluation

Measure Response Quality

  • Add an evaluation framework to assess the quality of responses.
  • Key metrics for evaluation:
    • Retrieval accuracy
    • Response quality
    • User satisfaction

💡 Topic: Evaluation

Automated validation of LLM responses is difficult

Each answer is different, so standard assertions are tricky. Extracting meaning with the help of LLMs is not good enough (yet).

  • Approaches to consider:
    • Evaluate chunk retrieval accuracy (no LLM validation needed); see the sketch after this list
    • Lab-testing setup: generate a report for manual validation
    • RAGAS: Retrieval-Augmented Generation Assessment System
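The first approach can be scripted without any LLM in the loop. Below is a minimal Python sketch of a recall@k check; the labelled question set and the `retrieve` function are stand-ins for your own data and retriever, not part of the workshop code:

```python
from typing import Callable

def recall_at_k(
    labelled: list[tuple[str, set[str]]],        # (question, IDs of the relevant chunks)
    retrieve: Callable[[str, int], list[str]],   # your retriever: returns chunk IDs, best first
    k: int = 5,
) -> float:
    """Average fraction of relevant chunks that show up in the top-k results."""
    scores = []
    for question, relevant_ids in labelled:
        hits = set(retrieve(question, k)) & relevant_ids
        scores.append(len(hits) / len(relevant_ids))
    return sum(scores) / len(scores)
```

Because the labels are fixed chunk IDs, this metric is deterministic and cheap enough to run on every change to the chunking or search setup.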

💡 Topic: Improve Chunking

Clean up your documents

Preprocess documents by removing non-content information (headers, footers, tables).

  • Remove headers & footers; they carry no content
  • Remove the table of contents; it creates confusing chunks
  • Remove or convert tables; LLMs can't understand table content by default
  • Remove images; LLMs can't process content hidden in images

Structured formats work best!
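Much of this cleanup can start with simple heuristics. The sketch below is a hypothetical helper (it assumes your loader yields one string per page) that drops lines repeating across most pages, which typically catches headers, footers, and page numbers:

```python
import re
from collections import Counter

def strip_repeated_lines(pages: list[str], min_fraction: float = 0.8) -> list[str]:
    """Remove lines that appear on most pages (typical headers/footers)
    as well as bare page numbers."""
    counts = Counter(
        line.strip() for page in pages for line in page.splitlines() if line.strip()
    )
    threshold = max(2, min_fraction * len(pages))
    cleaned = []
    for page in pages:
        kept = [
            line for line in page.splitlines()
            if line.strip()
            and counts[line.strip()] < threshold          # repeated header/footer
            and not re.fullmatch(r"\d+", line.strip())    # bare page number
        ]
        cleaned.append("\n".join(kept))
    return cleaned
```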


Chunking heavily impacts retrieval performance

  • Find the best chunk size:
    • Chunks too small: they miss context
    • Chunks too large: the embeddings become too generic
    • Treat it as a hyperparameter (same as the overlap)
  • Split chunks smartly:
    • Adapt the chunk size to the content (e.g., via embedding distance)
    • Use the document structure (sections / paragraphs / ...)
  • Add more context to chunks (see the sketch below):
    • File name, document title, name of the (sub-)chapter
      • The LLM can then even be asked to output an exact reference along with its answer
    • Text before & after (but not part of the embedding)
> https://www.llamaindex.ai/blog/evaluating-the-ideal-chunk-size-for-a-rag-system-using-llamaindex-6207e5d3fec5
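To make the last two ideas concrete, here is a minimal sketch (plain Python, all names ours) of a structure-aware splitter that cuts on paragraph boundaries and prefixes each chunk with the document title as extra context:

```python
def chunk_document(title: str, text: str, max_chars: int = 1000) -> list[str]:
    """Split on paragraph boundaries where possible; hard-cut only oversized
    paragraphs. Every chunk is prefixed with the title so its embedding
    carries that context."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        while len(para) > max_chars:          # a single oversized paragraph
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        current = para
    if current:
        chunks.append(current)
    return [f"{title}\n\n{chunk}" for chunk in chunks]
```

`max_chars` (and an overlap, if you add one) are exactly the hyperparameters the list above suggests tuning.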


💡 Topic: Improve Chunking

Improve response quality by improving chunking

  • Experiment with different chunk sizes.
  • Try different chunking algorithms (e.g., character-based, overlapping).
  • Apply context-based chunking (manual adjustments based on document structure).

💡 Topic: Improve Searching

Combine keyword and vector search

  • Perform a hybrid search that combines semantic vector search and traditional keyword search.
  • While embeddings usually outperform keyword search, combining the two can be even more powerful

> Azure Cognitive Search: Outperforming vector search with hybrid retrieval and ranking capabilities
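A common, dependency-free way to merge the two ranked result lists is Reciprocal Rank Fusion (RRF), the same fusion step Azure's hybrid retrieval builds on. A minimal sketch, with illustrative document IDs:

```python
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists (e.g. keyword hits and vector hits).
    Each document scores 1 / (k + rank) per list; k=60 is the usual constant."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = reciprocal_rank_fusion([
    ["doc3", "doc1", "doc7"],   # keyword search results (illustrative IDs)
    ["doc1", "doc5", "doc3"],   # vector search results
])
# doc1 and doc3 end up on top because both searches agree on them
```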


💡 Topic: Improve Searching - Continued

  • Query transformation
    • Reformulate the question, or derive multiple questions from it, and search with all of them (see the sketch below)
  • Use file metadata
    • E.g., take the document date into account when re-ranking
  • Hierarchical search
    • First search document summaries, then search only within the relevant documents
  • Semantic re-ranking
    • Re-rank results based on context, promoting results that fit the query intent
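Query transformation in particular is easy to prototype. The sketch below generates reformulations with an LLM, searches with each, and merges the result lists using the RRF helper from the hybrid-search sketch; `llm` and `search` are stand-ins for your own model call and retriever:

```python
def multi_query_search(question: str, llm, search, n_variants: int = 3) -> list[str]:
    """Query transformation: search with the original question plus
    LLM-generated reformulations, then fuse the ranked lists.

    llm(prompt) -> str and search(query) -> list[str] are placeholders."""
    prompt = (
        f"Rewrite the following question in {n_variants} different ways, "
        f"one per line:\n{question}"
    )
    variants = [question] + [v.strip() for v in llm(prompt).splitlines() if v.strip()]
    return reciprocal_rank_fusion([search(v) for v in variants])
```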

💡 Topic: Multi-Agent Approach

Problem:

Your RAG application stores policies for different regions. Because the policies cover similar topics, they overlap heavily in the vector space, and the vector search struggles to retrieve the correct chunks for the requested region.

Solution: Use Multi-Agent Setup

  • Implement a multi-agent approach to manage the regional context.
  • Introduce a main agent that determines the requested region.
  • Use function responses to redirect questions to the appropriate regional agent (sketched below).
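A minimal, dependency-free sketch of that routing; all names, including `search_index` and the `llm` callable, are hypothetical stand-ins:

```python
REGIONAL_AGENTS = {
    # region -> agent that searches only that region's policy index
    "EU": lambda q: search_index("policies-eu", q),
    "US": lambda q: search_index("policies-us", q),
    "APAC": lambda q: search_index("policies-apac", q),
}

def route_question(question: str, llm) -> list[str]:
    """Main agent: determine the region, then delegate to that region's agent.
    llm(prompt) -> str stands in for a model call (e.g. a function/tool call
    whose response names the region)."""
    prompt = (
        "Which region is this question about? Answer with exactly one of "
        f"{sorted(REGIONAL_AGENTS)}.\n\n{question}"
    )
    region = llm(prompt).strip()
    if region not in REGIONAL_AGENTS:
        raise ValueError(f"Could not determine region: {region!r}")
    return REGIONAL_AGENTS[region](question)
```

Because each regional agent queries a region-scoped index, the overlapping embeddings from the other regions never enter the search.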

Incorporate an agent into the retrieval process

This approach is also known as "query routing".


That's just the beginning

There is much more to explore

  • Conversation awareness
  • Prompt engineering
  • Iterative search
  • Hypothetical questions
  • Performance/cost optimizations
  • ...

Questions?


What's next?

  • If you are interested: improve your solution further!
    • Pick and choose from the topics above
    • A good starting point is Bonus Task: Evaluation or Bonus Task: Chunking
  • Reach out to us if you'd like to access the resources beyond the camp.